109 research outputs found

Graph algorithms for predicting subcellular localization at the pathway level

Author: Gitter Anthony
Magnano Chris S.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 12/12/2022

Protein subcellular localization is an important factor in normal cellular processes and disease. While many protein localization resources treat it as static, protein localization is dynamic and heavily influenced by biological context. Biological pathways are graphs that represent a specific biological context and can be inferred from large-scale data. We develop graph algorithms to predict the localization of all interactions in a biological pathway as an edge-labeling task. We compare a variety of models including graph neural networks, probabilistic graphical models, and discriminative classifiers for predicting localization annotations from curated pathway databases. We also perform a case study where we construct biological pathways and predict localizations of human fibroblasts undergoing viral infection. Pathway localization prediction is a promising approach for integrating publicly available localization data into the analysis of large-scale biological data. Comment: 35 pages, 14 figures

arXiv.org e-Print Archive

Bayes Optimal Informer Sets for Early-Stage Drug Discovery

Author: Ericksen Spencer S.
Gitter Anthony
Newton Michael A.
Yu Peng
Publication date: 11/11/2020

An important experimental design problem in early-stage drug discovery is how to prioritize available compounds for testing when very little is known about the target protein. Informer-based ranking (IBR) methods address the prioritization problem when the compounds have provided bioactivity data on other potentially relevant targets. An IBR method selects an informer set of compounds, and then prioritizes the remaining compounds on the basis of new bioactivity experiments performed with the informer set on the target. We formalize the problem as a two-stage decision problem and introduce the Bayes Optimal Informer SEt (BOISE) method for its solution. BOISE leverages a flexible model of the initial bioactivity data, a relevant loss function, and effective computational schemes to resolve the two-step design problem. We evaluate BOISE and compare it to other IBR strategies in two retrospective studies, one on protein-kinase inhibition and the other on anti-cancer drug sensitivity. In both empirical settings BOISE exhibits better predictive performance than available methods. It also behaves well with missing data, where methods that use matrix completion show worse predictive performance. We provide an R implementation of BOISE at https://github.com/wiscstatman/esdd/BOISE. Comment: 18 pages, 6 figures

arXiv.org e-Print Archive

Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

Author: Anandkumar Animashree
Fraenkel Ernest
Gitter Anthony
Huang Furong
Valluvan Ragupathyraj
Publication date: 20/09/2016

Gene expression is a readily-observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which possible regulators exist, regulator activity, or where transcription factors physically bind.

arXiv.org e-Print Archive

eScholarship - University of California

Caltech Authors

Network-Based Interpretation of Diverse High-Throughput Datasets through the Omics Integrator Software Package

Author: Fraenkel Ernest
Gitter Anthony
Gosline Sara Calafell
Kedaigle Amanda Joy
Soltis Anthony Robert
Tuncbag Nurcan
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/07/2015

High-throughput, ‘omic’ methods provide sensitive measures of biological responses to perturbations. However, inherent biases in high-throughput assays make it difficult to interpret experiments in which more than one type of data is collected. In this work, we introduce Omics Integrator, a software package that takes a variety of ‘omic’ data as input and identifies putative underlying molecular pathways. The approach applies advanced network optimization algorithms to a network of thousands of molecular interactions to find high-confidence, interpretable subnetworks that best explain the data. These subnetworks connect changes observed in gene expression, protein abundance or other global assays to proteins that may not have been measured in the screens due to inherent bias or noise in measurement. This approach reveals unannotated molecular pathways that would not be detectable by searching pathway databases. Omics Integrator also provides an elegant framework to incorporate not only positive data, but also negative evidence. Incorporating negative evidence allows Omics Integrator to avoid unexpressed genes and avoid being biased toward highly-studied hub proteins, except when they are strongly implicated by the data. The software comprises two individual tools, Garnet and Forest, that can be run together or independently to allow a user to perform advanced integration of multiple types of high-throughput data as well as create condition-specific subnetworks of protein interactions that best connect the observed changes in various datasets. It is available at http://fraenkel.mit.edu/omicsintegrator and on GitHub at https://github.com/fraenkel-lab/OmicsIntegrator. National Institutes of Health (U.S.) (grant U54CA112967); National Institutes of Health (U.S.) (grant U01CA184898); National Institutes of Health (U.S.) (grant U54NS091046); National Institutes of Health (U.S.) (grant R01GM089903)

DSpace@MIT

Sharing Information to Reconstruct Patient-Specific Pathways in Heterogeneous Diseases

Author: Baldassi Carlo
Borgs Christian
Braunstein Alfredo
Chayes Jennifer
Fraenkel Ernest
Gitter Anthony
Pagnani Andrea
Zecchina Riccardo
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2014

Advances in experimental techniques resulted in abundant genomic, transcriptomic, epigenomic, and proteomic data that have the potential to reveal critical drivers of human diseases. Complementary algorithmic developments enable researchers to map these data onto protein-protein interaction networks and infer which signaling pathways are perturbed by a disease. Despite this progress, integrating data across different biological samples or patients remains a substantial challenge because samples from the same disease can be extremely heterogeneous. Somatic mutations in cancer are an infamous example of this heterogeneity. Although the same signaling pathways may be disrupted in a cancer patient cohort, the distribution of mutations is long-tailed, and many driver mutations may only be detected in a small fraction of patients. We developed a computational approach to account for heterogeneous data when inferring signaling pathways by sharing information across the samples. Our technique builds upon the prize-collecting Steiner forest problem, a network optimization algorithm that extracts pathways from a protein-protein interaction network. We recover signaling pathways that are similar across all samples yet still reflect the unique characteristics of each biological sample. Leveraging data from related tumors improves our ability to recover the disrupted pathways and reveals patient-specific pathway perturbations in breast cancer. United States. Army Research Office (Institute for Collaborative Biotechnologies Grant W911NF-09-0001); National Institutes of Health (U.S.) (Grant U54-CA112967); Future & Emerging Technologies (Program) (Open Grant 265496); European Research Council (Grant 267915)
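
The prize-collecting Steiner forest objective named in the abstract above can be illustrated with a minimal sketch. It assumes the generic textbook form of the objective (penalties for omitted prize nodes plus costs of selected edges), not the multi-sample formulation of the paper. The function name `pcsf_objective`, the scaling parameter `beta`, and the toy network are all hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]


def pcsf_objective(
    prizes: Dict[str, float],
    edge_costs: Dict[FrozenSet[str], float],
    chosen_edges: Iterable[Edge],
    beta: float = 1.0,
) -> float:
    """Score a candidate forest: lower is better.

    Simplifying assumption: a node counts as covered only if it touches
    at least one chosen edge (isolated single-node trees are ignored).
    """
    chosen = list(chosen_edges)
    covered = {node for edge in chosen for node in edge}
    # Penalty for every prize node the forest leaves out.
    missed = sum(beta * p for node, p in prizes.items() if node not in covered)
    # Cost of every undirected edge the forest uses.
    cost = sum(edge_costs[frozenset(edge)] for edge in chosen)
    return missed + cost


if __name__ == "__main__":
    # Hypothetical toy interaction network with three prize nodes.
    prizes = {"A": 5.0, "B": 3.0, "D": 1.0}
    edge_costs = {
        frozenset(("A", "C")): 1.0,
        frozenset(("C", "B")): 1.0,
        frozenset(("B", "D")): 4.0,
    }
    for edges in ([], [("A", "C"), ("C", "B")], [("A", "C"), ("C", "B"), ("B", "D")]):
        print(edges, pcsf_objective(prizes, edge_costs, edges))
```

In this toy case, connecting A and B through the unprized node C scores better than also paying for the expensive edge to the low-prize node D. A real solver would search over edge subsets rather than score a handful of candidates.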

CiteSeerX

DSpace@MIT

Crossref

Archivio istituzionale della Ricerca - Bocconi

PubMed Central

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Backup in gene regulatory networks explains differences between binding and knockout results

Author: Bar-Joseph Ziv
Fornes Oriol
Gitter Anthony
Klutstein Michael
Oliva Baldo
Siegfried Zehava
Simon Itamar
Publication venue: Nature Publishing Group
Publication date: 01/01/2009

The complementarity of gene expression and protein–DNA interaction data led to several successful models of biological systems. However, recent studies in multiple species raise doubts about the relationship between these two datasets. These studies show that the overwhelming majority of genes bound by a particular transcription factor (TF) are not affected when that factor is knocked out. Here, we show that this surprising result can be partially explained by considering the broader cellular context in which TFs operate. Factors whose functions are not backed up by redundant paralogs show a fourfold increase in the agreement between their bound targets and the expression levels of those targets. In addition, we show that incorporating protein interaction networks provides physical explanations for knockout effects. New double knockout experiments support our conclusions. Our results highlight the robustness provided by redundant TFs and indicate that in the context of diverse cellular systems, binding is still largely functional.

Crossref

PubMed Central

UPF Digital Repository

An Open-Publishing Response to the COVID-19 Infodemic

Author: Boca Simina M.
Gitter Anthony
Greene Casey S.
Himmelstein Daniel S.
McGowan Lucy D’Agostino
Rando Halie M.
Robson Michael P.
Rubinetti Vincent
Velazquez Ryan
Publication venue: Smith ScholarWorks
Publication date: 01/01/2021

The COVID-19 pandemic catalyzed the rapid dissemination of papers and preprints investigating the disease and its associated virus, SARS-CoV-2. The multifaceted nature of COVID-19 demands a multidisciplinary approach, but the urgency of the crisis combined with the need for social distancing measures presents unique challenges to collaborative science. We applied a massive online open publishing approach to this problem using Manubot. Through GitHub, collaborators summarized and critiqued COVID-19 literature, creating a review manuscript. Manubot automatically compiled citation information for referenced preprints, journal publications, websites, and clinical trials. Continuous integration workflows retrieved up-to-date data from online sources nightly, regenerating some of the manuscript's figures and statistics. Manubot rendered the manuscript into PDF, HTML, LaTeX, and DOCX outputs, immediately updating the version available online upon the integration of new content. Through this effort, we organized over 50 scientists from a range of backgrounds who evaluated over 1,500 sources and developed seven literature reviews. While many efforts from the computational community have focused on mining COVID-19 literature, our project illustrates the power of open publishing to organize both technical and non-technical scientists to aggregate and disseminate information in response to an evolving crisis.

PubMed Central

Smith College: Smith ScholarWorks

Not a Waste: Wastewater Surveillance to Enhance Public Health

Author: Chavarria Carlos A.
Gitter Anna
Godbole Anuja Rajendra
Hanson Blake M.
Hu Tao
Maresso Anthony W.
Mena Kristina D.
Monserrat Carlos
Oghuan Jeremiah
Wang Yun
Wu Fuqing
Publication venue: Chapman University Digital Commons
Publication date: 09/01/2023

Domestic wastewater, when collected and evaluated appropriately, can provide valuable health-related information for a community. As a relatively unbiased and non-invasive approach, wastewater surveillance may complement current practices towards mitigating risks and protecting population health. Spurred by the COVID-19 pandemic, wastewater programs are now widely implemented to monitor viral infection trends in sewersheds and inform public health decision-making. This review summarizes recent developments in wastewater-based epidemiology for detecting and monitoring communicable infectious diseases, dissemination of antimicrobial resistance, and illicit drug consumption. Wastewater surveillance, a quickly advancing frontier in environmental science, is becoming a new tool to enhance public health, improve disease prevention, and respond to future epidemics and pandemics.

Chapman University Digital Commons

Discovering pathways by orienting edges in protein interaction networks

Author: Bar-Joseph Ziv
Gitter Anthony
Gupta Anupam
Klein-Seetharaman Judith
Publication venue: Oxford University Press
Publication date: 01/01/2010

Modern experimental technology enables the identification of the sensory proteins that interact with the cells’ environment or various pathogens. Expression and knockdown studies can determine the downstream effects of these interactions. However, when attempting to reconstruct the signaling networks and pathways between these sources and targets, one faces a substantial challenge. Although pathways are directed, high-throughput protein interaction data are undirected. In order to utilize the available data, we need methods that can orient protein interaction edges and discover high-confidence pathways that explain the observed experimental outcomes. We formalize the orientation problem in weighted protein interaction graphs as an optimization problem and present three approximation algorithms based on either weighted Boolean satisfiability solvers or probabilistic assignments. We use these algorithms to identify pathways in yeast. Our approach recovers twice as many known signaling cascades as a recent unoriented signaling pathway prediction technique and over 13 times as many as an existing network orientation algorithm. The discovered paths match several known signaling pathways and suggest new mechanisms that are not currently present in signaling databases. For some pathways, including the pheromone signaling pathway and the high-osmolarity glycerol pathway, our method suggests interesting and novel components that extend current annotations.
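
The orientation problem described in the abstract above can be made concrete with a rough sketch. It is a naive random-restart heuristic, not one of the three approximation algorithms the abstract mentions. It assumes that a source-to-target path counts only when every edge on it is oriented along the path, that a path's weight is the product of its edge weights, and that paths are limited to a fixed number of edges. All function names, the `max_edges` bound, and the toy graph are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set, Tuple

Orientation = Dict[FrozenSet[str], Tuple[str, str]]


def simple_paths(adj: Dict[str, List[str]], source: str,
                 targets: Set[str], max_edges: int) -> List[List[str]]:
    """Enumerate simple paths from source that stop at the first target reached."""
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node in targets and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 == max_edges:
            continue
        for nxt in adj.get(node, ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def satisfied_weight(paths: List[List[str]],
                     weights: Dict[FrozenSet[str], float],
                     orientation: Orientation) -> float:
    """Sum the weights of paths whose edges all point along the path."""
    total = 0.0
    for path in paths:
        weight = 1.0
        for u, v in zip(path, path[1:]):
            edge = frozenset((u, v))
            if orientation[edge] != (u, v):
                break
            weight *= weights[edge]
        else:  # no edge pointed the wrong way
            total += weight
    return total


def best_random_orientation(adj, weights, sources, targets,
                            max_edges=3, trials=100, seed=0):
    """Try random orientations and keep the one with the highest satisfied weight."""
    rng = random.Random(seed)
    paths = [p for s in sources for p in simple_paths(adj, s, targets, max_edges)]
    best, best_score = None, -1.0
    for _ in range(trials):
        orientation = {}
        for edge in weights:  # each edge is a frozenset of two distinct nodes
            u, v = sorted(edge)
            orientation[edge] = (u, v) if rng.random() < 0.5 else (v, u)
        score = satisfied_weight(paths, weights, orientation)
        if score > best_score:
            best, best_score = orientation, score
    return best, best_score
```

Random sampling is used here only because it is short to write. The abstract's point is that the search over orientations is hard enough to need approximation algorithms.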

CiteSeerX

Crossref

PubMed Central

Juelich Shared Electronic Resources